Wikipedia Arborification and Stratified Explicit Semantic Analysis
Abstract
We present an extension of the explicit semantic analysis method of Gabrilovich and Markovitch. Using their semantic relatedness measure, we weight the Wikipedia category graph. We then extract a minimum spanning tree from it by means of the Chu-Liu & Edmonds algorithm. We define a notion of stratified tfidf whose strata, for a given Wikipedia page and term, are the classical tfidf and the categorical tfidfs in the ancestor categories, in the sense of the minimum spanning tree. Our method uses this stratified tfidf, which favors terms that "survive" when moving from pages to categories, heading toward the root of the tree. We evaluate it on a classification of texts drawn from the WikiNews corpus and observe a precision gain of 18%. We conclude with a series of directions for future research.
Similar resources
UPC-CORE: What Can Machine Translation Evaluation Metrics and Wikipedia Do for Estimating Semantic Textual Similarity?
In this paper we discuss our participation in the SemEval 2013 Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation, originally intended to evaluate automatic translations against reference translations, and (ii) an instance of explicit semantic analysis, built upon the opening paragraphs of Wikipedia 2010 articles. Our similarity estimator ...
Non-Orthogonal Explicit Semantic Analysis
Explicit Semantic Analysis (ESA) uses the Wikipedia knowledge base to represent the semantics of a word by a vector in which every dimension refers to an explicitly defined concept, such as a Wikipedia article. ESA inherently assumes that Wikipedia concepts are orthogonal to each other; it therefore considers two words to be related only if they co-occur in the same articles. However, two word...
Query Expansion Using Wikipedia and Dbpedia
In this paper, we describe our query expansion approach submitted for the Semantic Enrichment task in Cultural Heritage in CLEF (CHiC) 2012. Our approach makes use of external knowledge bases such as Wikipedia and DBpedia. It consists of two major steps: generation of concept candidates from the knowledge bases and selection of the K-best related concepts. For selecting the K-best concepts, we ranke...
Wikipedia Link Structure and Text Mining for Semantic Relation Extraction
Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts in various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, i...
Combining Heterogeneous Knowledge Resources for Improved Distributional Semantic Models
The Explicit Semantic Analysis (ESA) model, based on term co-occurrences in Wikipedia, has been regarded as a state-of-the-art semantic relatedness measure in recent years. We provide an analysis of the important parameters of ESA using datasets in five different languages. Additionally, we propose the use of ESA with multiple lexical semantic resources, thus exploiting multiple evidence of term ...